Scalable reduction of large datasets to interesting subsets
Abstract
With a huge amount of RDF data available on the web, the ability to find and access relevant information is crucial. Traditional approaches to storing, querying, and inferencing fall short when faced with web-scale data. We present a system that combines the computational power of large clusters, which enables large-scale inferencing and data access, with an efficient data structure for storing and querying the accessed data on a traditional personal computer or smaller embedded device. We report results of using this system on a large cluster to load the Billion Triples Challenge dataset, fully materialize RDFS inferences, and extract an “interesting” subset of the data, and then to analyze the extracted subset further on a traditional personal computer.
Related resources
Size Matters – Revealing Small Scale Structures in Large Datasets
The size of datasets generated in the medical imaging community is increasing faster than additional processing resources are made available. Even if we can surmount the hurdles in large data handling and processing, the amount of information encoded in these datasets is overwhelming. Therefore effective visualization techniques must allow a user to identify and focus on scientifically interest...
Identifying Information-Rich Subspace Trends in High-Dimensional Data
Identifying information-rich subsets in high-dimensional spaces and representing them as order revealing patterns (or trends) is an important and challenging research problem in many science and engineering applications. The information quotient of large-scale high-dimensional datasets is significantly reduced by the curse of dimensionality which makes the traditional clustering and association...
Scalable Image Annotation by Summarizing Training Samples into Labeled Prototypes
As the number of images grows, fast search methods and intelligent filtering of images become essential. To handle images in large datasets, relevant tags are assigned to each image to describe its content. Automatic Image Annotation (AIA) aims to automatically assign a group of keywords to an image based on the visual content of the image. AIA frameworks have two main sta...
Dynamic Data Citation
Being able to reliably and efficiently cite entire or subsets of data in large and dynamically growing or changing datasets constitutes a significant challenge for a range of research domains. Current approaches rely on pointers to entire data collections or on explicit copies of data. They do not scale with large quantities of data. Hence a new method is required that enables to create, refere...
Sample-oriented Domain Adaptation for Image Classification
Image processing is a method of performing operations on an image in order to obtain an enhanced image or to extract useful information from it. Conventional image processing algorithms do not perform well in scenarios where the training images (source domain) used to learn the model have a different distribution from the test images (target domain). Also, many real world applicat...
Journal title:
- J. Web Sem.
Volume 8
Publication date: 2010